Anderson Banihirwe
I contribute to and maintain several libraries in the open source scientific Python stack, with a focus on scaling Python tools to handle terabyte-scale datasets on HPC and cloud platforms.
Education
B.S., Computer Systems Engineering
University of Arkansas at Little Rock
Little Rock, AR
2014 - 2018
Professional Experience
Software Engineer II
National Center for Atmospheric Research
Boulder, CO
2020-10 - present
- Created jupyter-forward, a JupyterLab port forwarding utility that simplifies running Jupyter on remote resources.
- Served as a core developer of xarray, an open source library for working with multidimensional labeled datasets and arrays in Python.
Software Engineer I
National Center for Atmospheric Research
Boulder, CO
2018-10 - 2020-09
- Led the intake-esm project, a Python data cataloguing package for exploring and ingesting Earth system model datasets.
- Contributed to the core software stack powering the Pangeo Project, including xarray and Dask.
- Assisted with the development and deployment of educational material for live (virtual or in-person) and online, self-paced instruction.
Software Developer Intern
Quansight
Austin, TX
2018-05 - 2018-09
- Developed xndframes, a pandas ExtensionDtype/Array backed by xnd, a container type that maps most Python values relevant for scientific computing directly to typed memory.
- Worked on integrating cuDF, a GPU dataframe library, with the Apache Arrow library.
Data Science Intern
First Orion
Little Rock, AR
2017-11 - 2018-04
- Built scoring and predictive models with scikit-learn, Dask, and Apache Spark using First Orion’s proprietary telecommunication data.
Research Intern
National Center for Atmospheric Research
Boulder, CO
2017-05 - 2017-08
- Developed spark-xarray, a Python package that integrates PySpark and xarray for climate data analysis.
Selected Publications, Posters, and Talks
Cloud-Native Repositories for Big Scientific Data
Computing in Science and Engineering
N/A
2020-11
- Authored with Ryan Abernathey, Tom Augspurger, et al.
Pangeo Benchmarking Analysis: Object Storage vs. POSIX File System
Fifth International Parallel Data Systems Workshop @ SC 20
N/A
2020-10
- Authored with Haiying Xu, Kevin Paul.
The Pangeo Ecosystem: Interactive Computing Tools for the Geosciences: Benchmarking on HPC
2019 Supercomputing Conference Workshop on Interactive High-Performance Computing
N/A
2020-01
- Authored with Tina Erica Odaka, Guillaume Eynard-Bontemps, Aurelien Ponte, Guillaume Maze, Kevin Paul, Jared Baker, Ryan Abernathey.
Pangeo Use Case: Analyzing Initialized Climate Prediction System Datasets with climpred
NOAA’s 45th Climate Diagnostics & Prediction Workshop
Online
2020-10
- Invited talk about climpred, a Python package for weather and climate forecasts.
Zarr: chunked, compressed, multidimensional arrays
2020 Cloud Native Geospatial Outreach Day
Online
2020-09
- Invited talk about Zarr, an open source data format for the storage of chunked, compressed, multidimensional arrays.
Intake-ESM – Making It Easier To Consume Climate and Weather Data
2020 ESIP Summer Meeting
Online
2020-07
- Invited talk about intake-esm, an intake plugin for working with Earth System Model (ESM) datasets.
Interactive Supercomputing with Dask and Jupyter
2019 Scientific Computing with Python conference
Austin, TX
2019-07
- Contributed talk about Dask and Jupyter.
Beyond Matplotlib - Tutorial: Building Interactive Climate Data Visualizations with Bokeh and Friends
2018 UCAR Software Engineering Assembly conference
Boulder, CO
2018-04
- Contributed tutorial about interactive visualization with Python.
PySpark for “Big” Atmospheric Data Analysis
Eighth Symposium on Advances in Modeling and Analysis Using Python
Austin, TX
2018-01
- Contributed talk about spark-xarray.